Dataset statistics
| Number of variables | 23 |
|---|---|
| Number of observations | 434605 |
| Missing cells | 1340989 |
| Missing cells (%) | 13.4% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 76.3 MiB |
| Average record size in memory | 184.0 B |
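The percentage of missing cells and the average record size can be re-derived from the other figures in this table. The sketch below uses only numbers copied from the table; the variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 23
n_observations = 434_605
missing_cells = 1_340_989
total_size_mib = 76.3  # already rounded in the report

# Every observation contributes one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"total cells:          {total_cells}")
print(f"missing cells (%):    {missing_pct:.1f}")
print(f"avg record size (B):  {avg_record_bytes:.1f}")
```

The missing-cell share rounds to the reported 13.4%. The record size comes out slightly above 184 B because the total size is itself rounded to one decimal place.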
Variable types
| CAT | 14 |
|---|---|
| NUM | 9 |
Reproduction
| Analysis started | 2020-07-09 22:47:48.641904 |
|---|---|
| Analysis finished | 2020-07-09 22:50:53.880440 |
| Duration | 3 minutes and 5.24 seconds |
| Version | pandas-profiling v2.8.0 |
| Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
| Download configuration | config.yaml |
Warnings
| Message | Type |
|---|---|
| region has a high cardinality: 403 distinct values | High cardinality |
| model has a high cardinality: 33313 distinct values | High cardinality |
| vin has a high cardinality: 141403 distinct values | High cardinality |
| state has a high cardinality: 51 distinct values | High cardinality |
| price_scaled is highly correlated with price | High correlation |
| price is highly correlated with price_scaled | High correlation |
| manufacturer has 20442 (4.7%) missing values | Missing |
| model has 6019 (1.4%) missing values | Missing |
| condition has 186345 (42.9%) missing values | Missing |
| cylinders has 165921 (38.2%) missing values | Missing |
| odometer has 74292 (17.1%) missing values | Missing |
| vin has 195579 (45.0%) missing values | Missing |
| drive has 121473 (28.0%) missing values | Missing |
| size has 295175 (67.9%) missing values | Missing |
| type has 116543 (26.8%) missing values | Missing |
| paint_color has 134689 (31.0%) missing values | Missing |
| lat has 8227 (1.9%) missing values | Missing |
| long has 8227 (1.9%) missing values | Missing |
| price is highly skewed (γ1 = 158.2481522) | Skewed |
| odometer is highly skewed (γ1 = 40.76855782) | Skewed |
| price_scaled is highly skewed (γ1 = 158.2481522) | Skewed |
| df_index has unique values | Unique |
| id has unique values | Unique |
| price has 30640 (7.1%) zeros | Zeros |
| price_scaled has 30640 (7.1%) zeros | Zeros |
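The skewness alerts above report γ1, the standardized third moment. As a rough illustration, the sketch below computes the plain moment-based estimator; the report's figures may come from a bias-adjusted variant, so small numerical differences on real data would be expected. The function name and sample lists are illustrative.

```python
def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no bias adjustment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]          # no tail on either side
right_tailed = [1, 1, 1, 1, 100]     # one extreme value on the right

print(skewness(symmetric))
print(skewness(right_tailed))
```

A symmetric sample gives a value of zero, while a single large outlier pushes the statistic well above zero. This is the pattern behind the price and odometer alerts, where a few extreme listings dominate the third moment.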
df_index
Real number (ℝ≥0)
| Distinct count | 434605 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 217933.09336754063 |
|---|---|
| Minimum | 0 |
| Maximum | 435848 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Memory size | 3.3 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 21778.2 |
| Q1 | 108955 |
| median | 217937 |
| Q3 | 326902 |
| 95-th percentile | 414069.8 |
| Maximum | 435848 |
| Range | 435848 |
| Interquartile range (IQR) | 217947 |
Descriptive statistics
| Standard deviation | 125826.7724 |
|---|---|
| Coefficient of variation (CV) | 0.5773642289 |
| Kurtosis | -1.200088501 |
| Mean | 217933.0934 |
| Median Absolute Deviation (MAD) | 108974 |
| Skewness | -0.000172233651 |
| Sum | 9.471481204e+10 |
| Variance | 1.583237665e+10 |
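The derived figures in the two tables directly above (range, interquartile range, coefficient of variation) follow from the quantiles, mean, and standard deviation. A minimal check, using values copied from those tables and illustrative variable names:

```python
# Values copied from the quantile and descriptive statistics tables above.
minimum, maximum = 0, 435848
q1, q3 = 108955, 326902
mean, std = 217933.0934, 125826.7724

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)

print(value_range, iqr, round(cv, 6))
```

The results agree with the reported range of 435848, IQR of 217947, and CV of about 0.577364.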
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
| 2047 | 1 | < 0.1% | |
| 427130 | 1 | < 0.1% | |
| 435326 | 1 | < 0.1% | |
| 87168 | 1 | < 0.1% | |
| 89217 | 1 | < 0.1% | |
| 83074 | 1 | < 0.1% | |
| 85123 | 1 | < 0.1% | |
| 95364 | 1 | < 0.1% | |
| 97413 | 1 | < 0.1% | |
| 91270 | 1 | < 0.1% | |
| Other values (434595) | 434595 | > 99.9% |
Minimum 5 values
| Value | Count | Frequency (%) | |
| 0 | 1 | < 0.1% | |
| 1 | 1 | < 0.1% | |
| 2 | 1 | < 0.1% | |
| 3 | 1 | < 0.1% | |
| 4 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
| 435848 | 1 | < 0.1% | |
| 435847 | 1 | < 0.1% | |
| 435846 | 1 | < 0.1% | |
| 435845 | 1 | < 0.1% | |
| 435844 | 1 | < 0.1% |
id
Real number (ℝ≥0)
| Distinct count | 434605 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7115953655.941774 |
|---|---|
| Minimum | 7096577274 |
| Maximum | 7121608239 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 3.3 MiB |
Quantile statistics
| Minimum | 7096577274 |
|---|---|
| 5-th percentile | 7107232076 |
| Q1 | 7112448897 |
| median | 7117092599 |
| Q3 | 7120091568 |
| 95-th percentile | 7121302169 |
| Maximum | 7121608239 |
| Range | 25030965 |
| Interquartile range (IQR) | 7642671 |
Descriptive statistics
| Standard deviation | 4591549.938 |
|---|---|
| Coefficient of variation (CV) | 0.0006452473077 |
| Kurtosis | -0.8476639242 |
| Mean | 7115953656 |
| Median Absolute Deviation (MAD) | 3479851 |
| Skewness | -0.6056791629 |
| Sum | 3.092629039e+15 |
| Variance | 2.108233083e+13 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
| 7119833087 | 1 | < 0.1% | |
| 7120462745 | 1 | < 0.1% | |
| 7117013867 | 1 | < 0.1% | |
| 7117853616 | 1 | < 0.1% | |
| 7117220776 | 1 | < 0.1% | |
| 7117580040 | 1 | < 0.1% | |
| 7121459071 | 1 | < 0.1% | |
| 7113159771 | 1 | < 0.1% | |
| 7116166063 | 1 | < 0.1% | |
| 7109929904 | 1 | < 0.1% | |
| Other values (434595) | 434595 | > 99.9% |
Minimum 5 values
| Value | Count | Frequency (%) | |
| 7096577274 | 1 | < 0.1% | |
| 7104270832 | 1 | < 0.1% | |
| 7104271788 | 1 | < 0.1% | |
| 7104272529 | 1 | < 0.1% | |
| 7105598410 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
| 7121608239 | 1 | < 0.1% | |
| 7121607873 | 1 | < 0.1% | |
| 7121607787 | 1 | < 0.1% | |
| 7121607706 | 1 | < 0.1% | |
| 7121607368 | 1 | < 0.1% |
region
Categorical
| Distinct count | 403 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.3 MiB |
Common values
| Value | Count | Frequency (%) | |
| springfield | 3588 | 0.8% | |
| jacksonville | 3457 | 0.8% | |
| columbus | 3280 | 0.8% | |
| fayetteville | 3135 | 0.7% | |
| richmond | 3042 | 0.7% | |
| salem | 2989 | 0.7% | |
| portland | 2983 | 0.7% | |
| des moines | 2980 | 0.7% | |
| boise | 2979 | 0.7% | |
| fresno / madera | 2979 | 0.7% | |
| Other values (393) | 403193 | 92.8% |
Length
| Max length | 26 |
|---|---|
| Median length | 11 |
| Mean length | 11.4778339 |
| Min length | 4 |
price
Real number (ℝ≥0)
| Distinct count | 16735 |
|---|---|
| Unique (%) | 3.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 135241.1402146777 |
|---|---|
| Minimum | 0 |
| Maximum | 3647256576 |
| Zeros | 30640 |
| Zeros (%) | 7.1% |
| Memory size | 3.3 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 4900 |
| median | 9995 |
| Q3 | 17988 |
| 95-th percentile | 34590 |
| Maximum | 3647256576 |
| Range | 3647256576 |
| Interquartile range (IQR) | 13088 |
Descriptive statistics
| Standard deviation | 16932750.66 |
|---|---|
| Coefficient of variation (CV) | 125.2041401 |
| Kurtosis | 26745.48215 |
| Mean | 135241.1402 |
| Median Absolute Deviation (MAD) | 6000 |
| Skewness | 158.2481522 |
| Sum | 5.877647574e+10 |
| Variance | 2.867180451e+14 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
| 0 | 30640 | 7.1% | |
| 6995 | 3963 | 0.9% | |
| 7995 | 3877 | 0.9% | |
| 4500 | 3802 | 0.9% | |
| 5995 | 3748 | 0.9% | |
| 3500 | 3716 | 0.9% | |
| 8995 | 3667 | 0.8% | |
| 5500 | 3482 | 0.8% | |
| 6500 | 3394 | 0.8% | |
| 9995 | 3368 | 0.8% | |
| Other values (16725) | 370948 | 85.4% |
Minimum 5 values
| Value | Count | Frequency (%) | |
| 0 | 30640 | 7.1% | |
| 1 | 1788 | 0.4% | |
| 2 | 16 | < 0.1% | |
| 3 | 25 | < 0.1% | |
| 4 | 17 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
| 3647256576 | 1 | < 0.1% | |
| 3333333333 | 1 | < 0.1% | |
| 3268562261 | 1 | < 0.1% | |
| 2989542968 | 3 | < 0.1% | |
| 2525141468 | 1 | < 0.1% |
year
Real number (ℝ≥0)
| Distinct count | 72 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 1117 |
| Missing (%) | 0.3% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2010.07815210571 |
|---|---|
| Minimum | 1950.0 |
| Maximum | 2021.0 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 3.3 MiB |
Quantile statistics
| Minimum | 1950 |
|---|---|
| 5-th percentile | 1998 |
| Q1 | 2007 |
| median | 2012 |
| Q3 | 2015 |
| 95-th percentile | 2018 |
| Maximum | 2021 |
| Range | 71 |
| Interquartile range (IQR) | 8 |
Descriptive statistics
| Standard deviation | 8.422884279 |
|---|---|
| Coefficient of variation (CV) | 0.004190326764 |
| Kurtosis | 12.2923228 |
| Mean | 2010.078152 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | -2.860329603 |
| Sum | 871344758 |
| Variance | 70.94497958 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
| 2017 | 34592 | 8.0% | |
| 2015 | 32918 | 7.6% | |
| 2016 | 32096 | 7.4% | |
| 2014 | 31703 | 7.3% | |
| 2013 | 31434 | 7.2% | |
| 2012 | 29108 | 6.7% | |
| 2011 | 26532 | 6.1% | |
| 2008 | 22643 | 5.2% | |
| 2007 | 20457 | 4.7% | |
| 2018 | 20147 | 4.6% | |
| Other values (62) | 151858 | 34.9% |
Minimum 5 values
| Value | Count | Frequency (%) | |
| 1950 | 120 | < 0.1% | |
| 1951 | 113 | < 0.1% | |
| 1952 | 88 | < 0.1% | |
| 1953 | 93 | < 0.1% | |
| 1954 | 94 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
| 2021 | 119 | < 0.1% | |
| 2020 | 2820 | 0.6% | |
| 2019 | 15531 | 3.6% | |
| 2018 | 20147 | 4.6% | |
| 2017 | 34592 | 8.0% |
manufacturer
Categorical
| Distinct count | 42 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 20442 |
| Missing (%) | 4.7% |
| Memory size | 3.3 MiB |
Common values
| Value | Count | Frequency (%) | |
| ford | 77721 | 17.9% | |
| chevrolet | 62193 | 14.3% | |
| toyota | 34332 | 7.9% | |
| nissan | 23135 | 5.3% | |
| honda | 22408 | 5.2% | |
| ram | 20300 | 4.7% | |
| jeep | 19685 | 4.5% | |
| gmc | 18644 | 4.3% | |
| dodge | 14370 | 3.3% | |
| bmw | 12516 | 2.9% | |
| Other values (32) | 108859 | 25.0% | |
| (Missing) | 20442 | 4.7% |
Length
| Max length | 15 |
|---|---|
| Median length | 5 |
| Mean length | 5.639065358 |
| Min length | 3 |
model
Categorical
| Distinct count | 33313 |
|---|---|
| Unique (%) | 7.8% |
| Missing | 6019 |
| Missing (%) | 1.4% |
| Memory size | 3.3 MiB |
Common values
| Value | Count | Frequency (%) | |
| f-150 | 8513 | 2.0% | |
| silverado 1500 | 5457 | 1.3% | |
| 1500 | 4690 | 1.1% | |
| silverado | 3962 | 0.9% | |
| accord | 3303 | 0.8% | |
| camry | 3272 | 0.8% | |
| altima | 3140 | 0.7% | |
| escape | 2915 | 0.7% | |
| grand cherokee | 2905 | 0.7% | |
| 2500 | 2873 | 0.7% | |
| Other values (33303) | 387556 | 89.2% | |
| (Missing) | 6019 | 1.4% |
Length
| Max length | 73 |
|---|---|
| Median length | 8 |
| Mean length | 10.54961172 |
| Min length | 1 |
condition
Categorical
| Distinct count | 6 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 186345 |
| Missing (%) | 42.9% |
| Memory size | 3.3 MiB |
Common values
| Value | Count | Frequency (%) | |
| excellent | 118118 | 27.2% | |
| good | 93693 | 21.6% | |
| like new | 27535 | 6.3% | |
| fair | 6895 | 1.6% | |
| new | 1335 | 0.3% | |
| salvage | 684 | 0.2% | |
| (Missing) | 186345 | 42.9% |
Length
| Max length | 9 |
|---|---|
| Median length | 4 |
| Mean length | 5.185218762 |
| Min length | 3 |
cylinders
Categorical
| Distinct count | 8 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 165921 |
| Missing (%) | 38.2% |
| Memory size | 3.3 MiB |
Common values
| Value | Count | Frequency (%) | |
| 6 cylinders | 95312 | 21.9% | |
| 4 cylinders | 85880 | 19.8% | |
| 8 cylinders | 81633 | 18.8% | |
| 5 cylinders | 2419 | 0.6% | |
| 10 cylinders | 1601 | 0.4% | |
| other | 1102 | 0.3% | |
| 3 cylinders | 532 | 0.1% | |
| 12 cylinders | 205 | < 0.1% | |
| (Missing) | 165921 | 38.2% |
Length
| Max length | 12 |
|---|---|
| Median length | 11 |
| Mean length | 7.934747644 |
| Min length | 3 |
fuel
Categorical
| Distinct count | 5 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 2989 |
| Missing (%) | 0.7% |
| Memory size | 3.3 MiB |
Common values
| Value | Count | Frequency (%) | |
| gas | 375212 | 86.3% | |
| diesel | 37802 | 8.7% | |
| other | 13312 | 3.1% | |
| hybrid | 4266 | 1.0% | |
| electric | 1024 | 0.2% | |
| (Missing) | 2989 | 0.7% |
Length
| Max length | 8 |
|---|---|
| Median length | 3 |
| Mean length | 3.363428861 |
| Min length | 3 |
odometer
Real number (ℝ≥0)
| Distinct count | 108780 |
|---|---|
| Unique (%) | 30.2% |
| Missing | 74292 |
| Missing (%) | 17.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 98949.30538448515 |
|---|---|
| Minimum | 0.0 |
| Maximum | 10000000.0 |
| Zeros | 2099 |
| Zeros (%) | 0.5% |
| Memory size | 3.3 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 11077 |
| Q1 | 47454 |
| median | 91274 |
| Q3 | 134795 |
| 95-th percentile | 202964 |
| Maximum | 10000000 |
| Range | 10000000 |
| Interquartile range (IQR) | 87341 |
Descriptive statistics
| Standard deviation | 109739.8699 |
|---|---|
| Coefficient of variation (CV) | 1.109051443 |
| Kurtosis | 3180.005139 |
| Mean | 98949.30538 |
| Median Absolute Deviation (MAD) | 43710 |
| Skewness | 40.76855782 |
| Sum | 3.565272107e+10 |
| Variance | 1.204283905e+10 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
| 0 | 2099 | 0.5% | |
| 150000 | 1006 | 0.2% | |
| 130000 | 979 | 0.2% | |
| 140000 | 931 | 0.2% | |
| 120000 | 893 | 0.2% | |
| 160000 | 811 | 0.2% | |
| 170000 | 784 | 0.2% | |
| 200000 | 742 | 0.2% | |
| 180000 | 732 | 0.2% | |
| 125000 | 657 | 0.2% | |
| Other values (108770) | 350679 | 80.7% | |
| (Missing) | 74292 | 17.1% |
Minimum 5 values
| Value | Count | Frequency (%) | |
| 0 | 2099 | 0.5% | |
| 1 | 351 | 0.1% | |
| 2 | 52 | < 0.1% | |
| 3 | 74 | < 0.1% | |
| 4 | 29 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
| 10000000 | 3 | < 0.1% | |
| 9999999 | 10 | < 0.1% | |
| 9855500 | 1 | < 0.1% | |
| 9208483 | 1 | < 0.1% | |
| 8148700 | 1 | < 0.1% |
title_status
Categorical
| Distinct count | 6 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 1805 |
| Missing (%) | 0.4% |
| Memory size | 3.3 MiB |
Common values
| Value | Count | Frequency (%) | |
| clean | 411801 | 94.8% | |
| rebuilt | 11631 | 2.7% | |
| salvage | 5568 | 1.3% | |
| lien | 2857 | 0.7% | |
| missing | 663 | 0.2% | |
| parts only | 280 | 0.1% | |
| (Missing) | 1805 | 0.4% |
Length
| Max length | 10 |
|---|---|
| Median length | 5 |
| Mean length | 5.070539916 |
| Min length | 3 |
transmission
Categorical
| Distinct count | 3 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 2146 |
| Missing (%) | 0.5% |
| Memory size | 3.3 MiB |
Common values
| Value | Count | Frequency (%) | |
| automatic | 386579 | 88.9% | |
| manual | 28426 | 6.5% | |
| other | 17454 | 4.0% | |
| (Missing) | 2146 | 0.5% |
Length
| Max length | 9 |
|---|---|
| Median length | 9 |
| Mean length | 8.613511119 |
| Min length | 3 |
vin
Categorical
| Distinct count | 141403 |
|---|---|
| Unique (%) | 59.2% |
| Missing | 195579 |
| Missing (%) | 45.0% |
| Memory size | 3.3 MiB |
Common values
| Value | Count | Frequency (%) | |
| WA1LAAF78HD040006 | 124 | < 0.1% | |
| 77777777777777777 | 75 | < 0.1% | |
| SALGS2KF6GA245355 | 68 | < 0.1% | |
| 1XPBDP9X5HD363709 | 64 | < 0.1% | |
| 1HSXLAPT67J411927 | 63 | < 0.1% | |
| 1F66F5KY2G0A08512 | 59 | < 0.1% | |
| 1R9R1BF28JC828012 | 59 | < 0.1% | |
| WDCYC7DF1EX227210 | 56 | < 0.1% | |
| JM1NDAM74H0106020 | 52 | < 0.1% | |
| 5B4KP42Y013327793 | 52 | < 0.1% | |
| Other values (141393) | 238354 | 54.8% | |
| (Missing) | 195579 | 45.0% |
Length
| Max length | 18 |
|---|---|
| Median length | 17 |
| Mean length | 10.67266368 |
| Min length | 1 |
drive
Categorical
| Distinct count | 3 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 121473 |
| Missing (%) | 28.0% |
| Memory size | 3.3 MiB |
Common values
| Value | Count | Frequency (%) | |
| 4wd | 142685 | 32.8% | |
| fwd | 111117 | 25.6% | |
| rwd | 59330 | 13.7% | |
| (Missing) | 121473 | 28.0% |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
size
Categorical
| Distinct count | 4 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 295175 |
| Missing (%) | 67.9% |
| Memory size | 3.3 MiB |
Common values
| Value | Count | Frequency (%) | |
| full-size | 75109 | 17.3% | |
| mid-size | 40227 | 9.3% | |
| compact | 20874 | 4.8% | |
| sub-compact | 3220 | 0.7% | |
| (Missing) | 295175 | 67.9% |
Length
| Max length | 11 |
|---|---|
| Median length | 3 |
| Mean length | 4.751118832 |
| Min length | 3 |
type
Categorical
| Distinct count | 13 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 116543 |
| Missing (%) | 26.8% |
| Memory size | 3.3 MiB |
Common values
| Value | Count | Frequency (%) | |
| SUV | 80146 | 18.4% | |
| sedan | 79733 | 18.3% | |
| pickup | 40869 | 9.4% | |
| truck | 39441 | 9.1% | |
| coupe | 17238 | 4.0% | |
| other | 12825 | 3.0% | |
| hatchback | 12395 | 2.9% | |
| van | 9962 | 2.3% | |
| wagon | 9878 | 2.3% | |
| convertible | 8498 | 2.0% | |
| Other values (3) | 7077 | 1.6% | |
| (Missing) | 116543 | 26.8% |
Length
| Max length | 11 |
|---|---|
| Median length | 5 |
| Mean length | 4.415752235 |
| Min length | 3 |
paint_color
Categorical
| Distinct count | 12 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 134689 |
| Missing (%) | 31.0% |
| Memory size | 3.3 MiB |
Common values
| Value | Count | Frequency (%) | |
| white | 80052 | 18.4% | |
| black | 59512 | 13.7% | |
| silver | 44669 | 10.3% | |
| blue | 30380 | 7.0% | |
| grey | 30320 | 7.0% | |
| red | 29053 | 6.7% | |
| green | 7500 | 1.7% | |
| custom | 7177 | 1.7% | |
| brown | 6517 | 1.5% | |
| yellow | 2039 | 0.5% | |
| Other values (2) | 2697 | 0.6% | |
| (Missing) | 134689 | 31.0% |
Length
| Max length | 6 |
|---|---|
| Median length | 4 |
| Mean length | 4.237003716 |
| Min length | 3 |
state
Categorical
| Distinct count | 51 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.3 MiB |
Common values
| Value | Count | Frequency (%) | |
| ca | 46497 | 10.7% | |
| fl | 31560 | 7.3% | |
| tx | 24496 | 5.6% | |
| or | 18051 | 4.2% | |
| nc | 17026 | 3.9% | |
| ny | 16840 | 3.9% | |
| oh | 16250 | 3.7% | |
| mi | 14737 | 3.4% | |
| wi | 14054 | 3.2% | |
| tn | 12583 | 2.9% | |
| Other values (41) | 222511 | 51.2% |
Length
| Max length | 2 |
|---|---|
| Median length | 2 |
| Mean length | 2 |
| Min length | 2 |
lat
Real number (ℝ)
| Distinct count | 49177 |
|---|---|
| Unique (%) | 11.5% |
| Missing | 8227 |
| Missing (%) | 1.9% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 38.4037173583534 |
|---|---|
| Minimum | -83.1971 |
| Maximum | 79.6019 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 3.3 MiB |
Quantile statistics
| Minimum | -83.1971 |
|---|---|
| 5-th percentile | 28.11497 |
| Q1 | 34.2216 |
| median | 38.933 |
| Q3 | 42.4845 |
| 95-th percentile | 47.1991 |
| Maximum | 79.6019 |
| Range | 162.799 |
| Interquartile range (IQR) | 8.2629 |
Descriptive statistics
| Standard deviation | 6.038350579 |
|---|---|
| Coefficient of variation (CV) | 0.1572334918 |
| Kurtosis | 7.576887268 |
| Mean | 38.40371736 |
| Median Absolute Deviation (MAD) | 3.9186 |
| Skewness | -0.3766089785 |
| Sum | 16374500.2 |
| Variance | 36.46167771 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
| 33.7792 | 4777 | 1.1% | |
| 33.7865 | 3815 | 0.9% | |
| 40.4688 | 2147 | 0.5% | |
| 47.6961 | 2006 | 0.5% | |
| 47.7989 | 1985 | 0.5% | |
| 46.2348 | 1914 | 0.4% | |
| 47.6561 | 1437 | 0.3% | |
| 43.1824 | 1406 | 0.3% | |
| 41.1345 | 1155 | 0.3% | |
| 38.3826 | 1149 | 0.3% | |
| Other values (49167) | 404587 | 93.1% | |
| (Missing) | 8227 | 1.9% |
Minimum 5 values
| Value | Count | Frequency (%) | |
| -83.1971 | 1 | < 0.1% | |
| -75.7603 | 1 | < 0.1% | |
| -70.7668 | 2 | < 0.1% | |
| -67.0709 | 1 | < 0.1% | |
| -63.0097 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
| 79.6019 | 1 | < 0.1% | |
| 78.4733 | 1 | < 0.1% | |
| 67.6702 | 1 | < 0.1% | |
| 67.0022 | 1 | < 0.1% | |
| 66.8639 | 1 | < 0.1% |
long
Real number (ℝ)
| Distinct count | 48385 |
|---|---|
| Unique (%) | 11.3% |
| Missing | 8227 |
| Missing (%) | 1.9% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -94.96009014806579 |
|---|---|
| Minimum | -177.012 |
| Maximum | 173.675 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 3.3 MiB |
Quantile statistics
| Minimum | -177.012 |
|---|---|
| 5-th percentile | -122.608 |
| Q1 | -111.717 |
| median | -89.6767 |
| Q3 | -81.3976 |
| 95-th percentile | -73.08 |
| Maximum | 173.675 |
| Range | 350.687 |
| Interquartile range (IQR) | 30.3194 |
Descriptive statistics
| Standard deviation | 18.05832256 |
|---|---|
| Coefficient of variation (CV) | -0.1901674959 |
| Kurtosis | 1.181555964 |
| Mean | -94.96009015 |
| Median Absolute Deviation (MAD) | 9.8567 |
| Skewness | -0.6846858183 |
| Sum | -40488893.32 |
| Variance | 326.1030135 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
| -84.4118 | 4777 | 1.1% | |
| -84.4454 | 3815 | 0.9% | |
| -74.2817 | 2147 | 0.5% | |
| -116.781 | 2029 | 0.5% | |
| -116.742 | 1985 | 0.5% | |
| -119.128 | 1914 | 0.4% | |
| -117.237 | 1436 | 0.3% | |
| -84.1122 | 1406 | 0.3% | |
| -96.2458 | 1155 | 0.3% | |
| -93.7734 | 1149 | 0.3% | |
| Other values (48375) | 404565 | 93.1% | |
| (Missing) | 8227 | 1.9% |
Minimum 5 values
| Value | Count | Frequency (%) | |
| -177.012 | 1 | < 0.1% | |
| -170.288 | 1 | < 0.1% | |
| -161.875 | 3 | < 0.1% | |
| -160.097 | 1 | < 0.1% | |
| -160.059 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
| 173.675 | 1 | < 0.1% | |
| 139.388 | 1 | < 0.1% | |
| 139.348 | 1 | < 0.1% | |
| 133.77 | 1 | < 0.1% | |
| 127.724 | 1 | < 0.1% |
descwordcount
Real number (ℝ≥0)
| Distinct count | 3291 |
|---|---|
| Unique (%) | 0.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 372.01362156440905 |
|---|---|
| Minimum | 1 |
| Maximum | 11037 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 3.3 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 22 |
| Q1 | 68 |
| median | 228 |
| Q3 | 569 |
| 95-th percentile | 1112 |
| Maximum | 11037 |
| Range | 11036 |
| Interquartile range (IQR) | 501 |
Descriptive statistics
| Standard deviation | 441.2416145 |
|---|---|
| Coefficient of variation (CV) | 1.186089941 |
| Kurtosis | 23.04361022 |
| Mean | 372.0136216 |
| Median Absolute Deviation (MAD) | 185 |
| Skewness | 3.302504381 |
| Sum | 161678980 |
| Variance | 194694.1624 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
| 43 | 2245 | 0.5% | |
| 32 | 2204 | 0.5% | |
| 28 | 2197 | 0.5% | |
| 36 | 2197 | 0.5% | |
| 34 | 2174 | 0.5% | |
| 27 | 2152 | 0.5% | |
| 33 | 2147 | 0.5% | |
| 37 | 2140 | 0.5% | |
| 30 | 2137 | 0.5% | |
| 24 | 2135 | 0.5% | |
| Other values (3281) | 412877 | 95.0% |
Minimum 5 values
| Value | Count | Frequency (%) | |
| 1 | 95 | < 0.1% | |
| 2 | 184 | < 0.1% | |
| 3 | 381 | 0.1% | |
| 4 | 356 | 0.1% | |
| 5 | 418 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
| 11037 | 2 | < 0.1% | |
| 8616 | 11 | < 0.1% | |
| 8551 | 11 | < 0.1% | |
| 5386 | 2 | < 0.1% | |
| 5372 | 2 | < 0.1% |
price_scaled
Real number (ℝ≥0)
| Distinct count | 16735 |
|---|---|
| Unique (%) | 3.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.708023754199345e-05 |
|---|---|
| Minimum | 0.0 |
| Maximum | 0.9999999999999999 |
| Zeros | 30640 |
| Zeros (%) | 7.1% |
| Memory size | 3.3 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1.343475541e-06 |
| median | 2.740415924e-06 |
| Q3 | 4.931926127e-06 |
| 95-th percentile | 9.483840602e-06 |
| Maximum | 1 |
| Range | 1 |
| Interquartile range (IQR) | 3.588450587e-06 |
Descriptive statistics
| Standard deviation | 0.004642599256 |
|---|---|
| Coefficient of variation (CV) | 125.2041401 |
| Kurtosis | 26745.48215 |
| Mean | 3.708023754e-05 |
| Median Absolute Deviation (MAD) | 1.64507209e-06 |
| Skewness | 158.2481522 |
| Sum | 16.11525664 |
| Variance | 2.155372785e-05 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
| 0 | 30640 | 7.1% | |
| 1.917879879e-06 | 3963 | 0.9% | |
| 2.192058561e-06 | 3877 | 0.9% | |
| 1.233804068e-06 | 3802 | 0.9% | |
| 1.643701197e-06 | 3748 | 0.9% | |
| 9.596253861e-07 | 3716 | 0.9% | |
| 2.466237242e-06 | 3667 | 0.8% | |
| 1.50798275e-06 | 3482 | 0.8% | |
| 1.782161431e-06 | 3394 | 0.8% | |
| 2.740415924e-06 | 3368 | 0.8% | |
| Other values (16725) | 370948 | 85.4% |
Minimum 5 values
| Value | Count | Frequency (%) | |
| 0 | 30640 | 7.1% | |
| 2.741786817e-10 | 1788 | 0.4% | |
| 5.483573635e-10 | 16 | < 0.1% | |
| 8.225360452e-10 | 25 | < 0.1% | |
| 1.096714727e-09 | 17 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
| 1 | 1 | < 0.1% | |
| 0.9139289391 | 1 | < 0.1% | |
| 0.8961700919 | 1 | < 0.1% | |
| 0.81966895 | 3 | < 0.1% | |
| 0.6923399589 | 1 | < 0.1% |
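The scaled values in the tables above match price divided by the maximum price (for example, 6995 / 3647256576 ≈ 1.917879879e-06), which suggests that price_scaled is a min-max scaling of price. This is an inference from the numbers, not something the report states. A minimal sketch of that scaling, with a hypothetical helper name and a handful of prices taken from the report:

```python
def min_max_scale(values):
    """Map values linearly onto [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# A few prices that appear in the report, including the minimum and maximum.
prices = [0, 1, 6995, 9995, 3647256576]
scaled = min_max_scale(prices)

for p, s in zip(prices, scaled):
    print(p, s)
```

Because this is a linear transformation, it leaves skewness, kurtosis, and the coefficient of variation unchanged and yields a Pearson correlation of 1 with the original column. That is consistent with the identical γ1 values and the high-correlation alert for price and price_scaled.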
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
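The calculation described above can be sketched in a few lines of plain Python. The function name and sample data are illustrative; the 1/n factors in the covariance and the standard deviations cancel, so they are omitted.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r = pearson_r(x, y)
# Shifting and rescaling x should leave r unchanged.
r_rescaled = pearson_r([10 * v + 7 for v in x], y)
print(r, r_rescaled)
```

The second call illustrates the invariance under changes in location and scale: both results are the same up to floating-point rounding.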
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
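As a rough illustration of the rank-based calculation, the sketch below ranks both variables (ties receive the average of their positions, which is one common convention) and then applies the Pearson formula to the ranks. All function names and the sample data are illustrative.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def spearman_rho(x, y):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear in x

print(pearson_r(x, y))
print(spearman_rho(x, y))
```

For this monotonic but nonlinear relationship, ρ equals 1 up to floating-point rounding while r stays below 1.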
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
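The pair-counting definition above corresponds to the simplest form of the statistic, often called tau-a. The sketch below implements that form only; statistical libraries commonly use a tie-adjusted variant, so results on data with many ties may differ. The function name and sample data are illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs, with no tie adjustment."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1      # both variables move in the same direction
        elif sign < 0:
            discordant += 1      # the variables move in opposite directions
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # one swapped pair out of six

print(kendall_tau_a(x, y))
```

With five concordant pairs and one discordant pair out of six, the example gives τ = 4/6.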
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. The report uses the bias-corrected measure proposed by Bergsma in 2013.
First rows
| | df_index | id | region | price | year | manufacturer | model | condition | cylinders | fuel | odometer | title_status | transmission | vin | drive | size | type | paint_color | state | lat | long | descwordcount | price_scaled |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 7119256118 | mohave county | 3495 | 2012.00 | jeep | patriot | like new | 4 cylinders | gas | nan | clean | automatic | NaN | NaN | NaN | NaN | silver | az | 34.46 | -114.27 | 45 | 0.00 |
| 1 | 1 | 7120880186 | oregon coast | 13750 | 2014.00 | bmw | 328i m-sport | good | NaN | gas | 76237.00 | clean | automatic | NaN | rwd | NaN | sedan | grey | or | 46.18 | -123.82 | 148 | 0.00 |
| 2 | 2 | 7115048251 | greenville / upstate | 2300 | 2001.00 | dodge | caravan | excellent | 6 cylinders | gas | 199000.00 | clean | automatic | NaN | NaN | NaN | NaN | NaN | sc | 34.94 | -81.97 | 46 | 0.00 |
| 3 | 3 | 7119250502 | mohave county | 9000 | 2004.00 | chevrolet | colorado ls | excellent | 5 cylinders | gas | 54000.00 | clean | automatic | 1GCCS196448191644 | rwd | mid-size | pickup | red | az | 34.48 | -114.27 | 65 | 0.00 |
| 4 | 4 | 7120433904 | maine | 0 | 2021.00 | NaN | Honda-Nissan-Kia-Ford-Hyundai-VW | NaN | NaN | other | nan | clean | other | NaN | NaN | NaN | NaN | NaN | me | 44.47 | -68.90 | 224 | 0.00 |
| 5 | 5 | 7120432569 | maine | 500 | 2010.00 | NaN | $500 DOWN PROGRAMS!!! | NaN | NaN | gas | nan | clean | automatic | NaN | NaN | NaN | NaN | NaN | me | 42.84 | -71.11 | 162 | 0.00 |
| 6 | 6 | 7120431378 | maine | 0 | 2014.00 | ford | f-150 | excellent | 8 cylinders | gas | 0.00 | clean | automatic | S7002 | 4wd | full-size | pickup | NaN | me | 42.77 | -71.24 | 1016 | 0.00 |
| 7 | 7 | 7120430837 | maine | 8500 | 2005.00 | ford | mustang convertible | excellent | 6 cylinders | gas | 62800.00 | clean | automatic | 1ZVHT84N355252184 | rwd | mid-size | convertible | silver | me | 44.21 | -69.79 | 113 | 0.00 |
| 8 | 8 | 7120857037 | oregon coast | 0 | 2012.00 | ram | 3500 | NaN | 6 cylinders | diesel | 116515.00 | clean | automatic | 3C63D3KL1CG155836 | 4wd | NaN | truck | NaN | or | 45.41 | -122.62 | 1049 | 0.00 |
| 9 | 9 | 7120844862 | oregon coast | 5950 | 2004.00 | honda | odyssey ex-l, reliable, e | NaN | 6 cylinders | gas | 102415.00 | rebuilt | automatic | 5FNRL18924B012679 | fwd | NaN | van | NaN | or | 45.58 | -122.68 | 733 | 0.00 |
Last rows
| | df_index | id | region | price | year | manufacturer | model | condition | cylinders | fuel | odometer | title_status | transmission | vin | drive | size | type | paint_color | state | lat | long | descwordcount | price_scaled |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 434595 | 435839 | 7112254206 | rapid city / west SD | 29930 | 2016.00 | ram | 1500 | excellent | 8 cylinders | gas | 30383.00 | clean | automatic | 1C6RR7MT6GS265408 | 4wd | mid-size | truck | blue | sd | 44.08 | -103.23 | 360 | 0.00 |
| 434596 | 435840 | 7116829959 | helena | 24900 | 2017.00 | audi | q3 premium plus | excellent | 4 cylinders | gas | 27100.00 | clean | automatic | NaN | 4wd | full-size | SUV | silver | mt | 45.65 | -110.56 | 125 | 0.00 |
| 434597 | 435841 | 7109272290 | richmond | 9995 | 2008.00 | buick | enclave | excellent | 6 cylinders | gas | 145975.00 | clean | automatic | 5GAEV23778J148469 | 4wd | NaN | SUV | brown | va | nan | nan | 428 | 0.00 |
| 434598 | 435842 | 7119281941 | mohave county | 2495 | 2006.00 | lincoln | town car | NaN | 8 cylinders | gas | 126302.00 | clean | automatic | 1LNHM82V76Y636936 | rwd | NaN | sedan | white | az | 34.46 | -114.29 | 120 | 0.00 |
| 434599 | 435843 | 7115048966 | greenville / upstate | 46995 | 2019.00 | ford | f250 diesel powerstroke 4x4 | like new | 8 cylinders | diesel | 55000.00 | clean | automatic | NaN | 4wd | full-size | pickup | white | sc | 34.80 | -82.39 | 83 | 0.00 |
| 434600 | 435844 | 7119262300 | mohave county | 2500 | 2005.00 | ford | f150 | fair | NaN | gas | 282866.00 | clean | automatic | NaN | NaN | full-size | truck | white | az | 35.24 | -113.99 | 17 | 0.00 |
| 434601 | 435845 | 7112219717 | rapid city / west SD | 2700 | 2002.00 | toyota | camry | good | 6 cylinders | gas | 194000.00 | clean | automatic | NaN | fwd | NaN | NaN | blue | sd | 44.00 | -103.36 | 29 | 0.00 |
| 434602 | 435846 | 7120896708 | oregon coast | 2450 | 2001.00 | ford | focus | good | 4 cylinders | gas | 130484.00 | clean | automatic | NaN | rwd | compact | other | black | or | 45.53 | -123.09 | 66 | 0.00 |
| 434603 | 435847 | 7120885819 | oregon coast | 8995 | 2013.00 | mazda | mazda3 | NaN | NaN | gas | 93339.00 | clean | automatic | JM1BL1UPXD1758084 | fwd | NaN | sedan | NaN | or | 45.52 | -122.58 | 664 | 0.00 |
| 434604 | 435848 | 7112215161 | rapid city / west SD | 6577 | 2010.00 | dodge | grand caravan | NaN | NaN | gas | 148721.00 | clean | automatic | 2D4RN5DX0AR140668 | fwd | NaN | mini-van | blue | sd | 44.08 | -103.19 | 299 | 0.00 |